A New Approach to Automatic Summarization by Using Latent Dirichlet Allocation in Conditional Random Field
Authors
Xiaofeng Wu, Chengqing Zong (National Lab of Pattern Recognition, Institute of Automation, CAS, Beijing 100190, China)
Abstract
In recent years, Latent Dirichlet Allocation (LDA) has been used increasingly in document clustering, classification, and segmentation, and it has also been applied to query-based multi-document summarization, which is an unsupervised approach. LDA is recognized for its power in modeling a document semantically. In this paper we propose a new approach to extractive, supervised single-document summarization, called LDA-based CRF (Conditional Random Field) Automatic Summarization (LCAS), which adds the Latent Dirichlet Allocation representation of the document as new features to a CRF summarization system. We study the power of LDA and analyze how its effect changes with the number of topics. Our experiments show that adding LDA features substantially improves the results of a traditional CRF summarization system.
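As a rough illustration of the idea in the abstract, the sketch below appends per-sentence topic proportions to ordinary sentence features before they are handed to a CRF. It is not the authors' feature set: the names `sentence_features` and `document_features`, the baseline features (relative position and length), and the dominant-topic feature are all assumptions made for this example. The topic distributions are taken as given, for instance inferred by any LDA implementation, and the output uses the one-dict-per-item layout that CRF toolkits such as python-crfsuite accept.

```python
from typing import Dict, List, Sequence


def sentence_features(
    position: int,
    n_sentences: int,
    tokens: Sequence[str],
    topic_dist: Sequence[float],
) -> Dict[str, float]:
    """Build one feature dict for a sentence (hypothetical feature set)."""
    feats: Dict[str, float] = {
        "bias": 1.0,
        # Illustrative baseline features, not taken from the paper.
        "rel_position": position / max(n_sentences - 1, 1),
        "length": float(len(tokens)),
    }
    # LDA features: one real-valued feature per topic proportion.
    for k, p in enumerate(topic_dist):
        feats[f"lda_topic_{k}"] = float(p)
    # Assumed extra feature: an indicator for the most probable topic.
    dominant = max(range(len(topic_dist)), key=lambda k: topic_dist[k])
    feats[f"lda_dominant={dominant}"] = 1.0
    return feats


def document_features(
    sentences: Sequence[Sequence[str]],
    topic_dists: Sequence[Sequence[float]],
) -> List[Dict[str, float]]:
    """Turn a document (a sequence of tokenized sentences) into a CRF input sequence."""
    if len(sentences) != len(topic_dists):
        raise ValueError("need exactly one topic distribution per sentence")
    n = len(sentences)
    return [
        sentence_features(i, n, toks, dist)
        for i, (toks, dist) in enumerate(zip(sentences, topic_dists))
    ]


if __name__ == "__main__":
    doc = [
        ["lda", "models", "topics"],
        ["crf", "labels", "sentences", "in", "sequence"],
    ]
    # Made-up topic proportions for 3 topics, one row per sentence.
    dists = [[0.7, 0.2, 0.1], [0.1, 0.1, 0.8]]
    for feats in document_features(doc, dists):
        print(feats)
```

Each document becomes one sequence of feature dicts, and a CRF trained on such sequences would label every sentence as in or out of the summary. Changing the number of topics only changes the length of each topic distribution, which is the knob the abstract says the authors vary.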
Similar resources
Automatic keyword extraction using Latent Dirichlet Allocation topic modeling: Similarity with golden standard and users' evaluation
Purpose: This study investigates the automatic keyword extraction from the table of contents of Persian e-books in the field of science using LDA topic modeling, evaluating their similarity with golden standard, and users' viewpoints of the model keywords. Methodology: This is a mixed text-mining research in which LDA topic modeling is used to extract keywords from the table of contents of sci...
The Information Extraction Systems of PRIS at Temporal Summarization Track
This paper describes the information extraction systems of PRIS at Temporal Summarization Track. The Temporal Summarization Track includes two tasks: sequential update summarization and value tracking. For the first task, we focus attention on keywords mining and sentence scoring. The system utilizes hierarchical Latent Dirichlet Allocation (LDA) to do keywords mining and score sentences with k...
Character Categorization via Latent Dirichlet Allocation for Kana Sequence Segmentation with Conditional Random Fields
We propose an efficient Kana sequence segmentation as a component of faster and easier interfaces for e-learning systems. We assign categories to Kana characters via latent Dirichlet allocation (LDA) and use the categories to compose additional features for conditional random fields (CRF). We compare the categories our method gives and those manually prepared by their efficiency in Kana sequenc...
Multi-Conditional Learning for Joint Probability Models with Latent Variables
We introduce Multi-Conditional Learning, a framework for optimizing graphical models based not on joint likelihood, or on conditional likelihood, but based on a product of several marginal conditional likelihoods each relying on common sets of parameters from an underlying joint model and predicting different subsets of variables conditioned on other subsets. When applied to undirected models w...
Comparative Summarization via Latent Dirichlet Allocation
This paper aims to explore the possibility of using Latent Dirichlet Allocation (LDA) for multi-document comparative summarization which detects the main differences in documents. The first two sections of this paper focus on the definition of comparative summarization and a brief explanation of using the LDA topic model in this context. In the last three sections, our novel method for multi-do...
Publication date: 2009